docs(building_graders): add training reward models guide and update integrations #46
Conversation
…ntegrations

- Replace `training_grpo.md` with comprehensive `training_reward_models.md`
- Add LangSmith integration to mkdocs navigation
- Update `overview.md` links to the new training documentation
- Refactor `langfuse.md` and `langsmith.md` integration docs
- Minor fix in `sft/README.md`
Summary of Changes

Hello @XiaoBoAI, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly overhauls and enhances the documentation for OpenJudge, focusing on how users can train judge models and integrate with popular LLM observability platforms. The changes aim to provide a more structured, comprehensive, and user-friendly experience by consolidating training methodologies into a single guide and refining the integration tutorials for Langfuse and LangSmith. This ensures that users have clear, up-to-date resources for leveraging OpenJudge's capabilities.
Code Review
This pull request significantly improves the documentation by adding a comprehensive guide on training reward models and refactoring the Langfuse and LangSmith integration guides for better clarity and structure. The changes are well-organized and make the documentation more user-friendly. I've left a few suggestions to improve consistency in the new training guide and to fix a code example in the Langfuse integration documentation.
```python
for i, result in enumerate(grader_results):
    trace_id = trace_id_mapping[i]
    print(f"Sending {grader_name} score for trace {trace_id}")
    send_result_to_langfuse(trace_id, grader_name, result)
```
The `send_result_to_langfuse` function is called here, but its definition appears later in the document. In Python, a name must be defined by the time the call executes, so running this example top-to-bottom will raise a `NameError`. To make the example runnable, please move the definition of `send_result_to_langfuse` to before the `batch_evaluate_traces` function.
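A minimal, hypothetical sketch of the suggested ordering (function and variable names mirror the snippet above; the Langfuse client call is stubbed out for illustration):

```python
# Define the helper first, so it is bound before batch_evaluate_traces runs.
def send_result_to_langfuse(trace_id, grader_name, result):
    # Stub: the real implementation would call the Langfuse client here.
    print(f"Sending {grader_name} score for trace {trace_id}: {result}")


def batch_evaluate_traces(grader_name, grader_results, trace_id_mapping):
    sent = []
    for i, result in enumerate(grader_results):
        trace_id = trace_id_mapping[i]
        send_result_to_langfuse(trace_id, grader_name, result)
        sent.append(trace_id)
    return sent


# Usage: with the helper defined above, no NameError is raised.
sent = batch_evaluate_traces("relevance", [0.9, 0.7], ["t1", "t2"])
```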
### Training Objective

$$\mathcal{L} = -\sum_{t} \log P(y_t | y_{<t}, x)$$
For consistency with other LaTeX formulas in the repository (e.g., the one fixed in `cookbooks/training_judge_model/sft/README.md` in this same PR), it's better to use `\mid` for the conditional probability bar and `\lt` for the less-than symbol. This improves readability of the mathematical notation.
```diff
- $$\mathcal{L} = -\sum_{t} \log P(y_t | y_{<t}, x)$$
+ $$\mathcal{L} = -\sum_{t} \log P(y_t \mid y_{\lt t}, x)$$
```
| Parameter | Default | Description |
|-----------|---------|-------------|
| `MODEL_PATH` | `./models/Qwen3-14B` | Base model path |
| `TRAIN_BATCH_SIZE` | `96` | Global batch size |
| `MICRO_BATCH_SIZE` | `12` | Per-GPU micro batch |
| `MAX_LENGTH` | `4096` | Maximum sequence length |
| `SP_SIZE` | `8` | Sequence parallel size |
| `TOTAL_EPOCHS` | `1` | Training epochs |
The `MODEL_PATH` default value `./models/Qwen3-14B` uses a local-style path. This is inconsistent with the GRPO section, which uses a HuggingFace model ID (`Qwen/Qwen3-8B`). For clarity and consistency across the document, it would be better to use HuggingFace model IDs for all examples. A similar issue exists in the Bradley-Terry configuration table (line 150).
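A hedged sketch of the suggested change, assuming the script reads the model location from a `MODEL_PATH` environment variable as the table implies:

```shell
# Hypothetical: prefer a HuggingFace Hub model ID over a local path,
# matching the GRPO section's style (e.g. Qwen/Qwen3-8B).
MODEL_PATH="Qwen/Qwen3-14B"   # instead of ./models/Qwen3-14B
echo "Using model: $MODEL_PATH"
```

With a Hub ID, the training framework can resolve and download the weights itself, and the example works without a pre-populated local `./models` directory.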
OpenJudge Version
0.2.0
Description
Checklist
Please check the following items before code is ready to be reviewed.
Run the `pre-commit run --all-files` command